Abstract The phenomenon of protein synthesis has been modeled in terms 
of totally asymmetric simple exclusion processes (TASEP) since 1968. In 
this article, we provide a tutorial of the biological and mathematical aspects 
of this approach. We also summarize several new results, concerned with 
limited resources in the cell and simple estimates for the current (protein 
production rate) of a TASEP with inhomogeneous hopping rates, reflecting 
the characteristics of real genes. 

Keywords Protein synthesis • TASEP • nonequilibrium statistical physics 



1 Introduction 

Nonequilibrium statistical physics remains one of the greatest unsolved
challenges of theoretical physics. In recent studies, both the National Academy of
Sciences [78] and the Department of Energy [117] have recognized the
importance and scientific impact of developing a fundamental and comprehensive
understanding of physics far from equilibrium. Unlike string theory or
cosmology, this field addresses phenomena in our immediate experience, such
as the flocking of birds or fish [...], traffic flow [...], or biological
transport [53, 84, 58, 43, 69, ...], to mention just a few examples. Living
systems and biological phenomena, in particular, are areas where concepts and
methods from nonequilibrium statistical physics find a natural application. 
On the one hand, nonequilibrium statistical physics is concerned with open 
many-particle systems, sustaining nontrivial currents of energy or particles; 
and on the other hand, biological systems are characterized by considerable 
complexity and depend on energy and matter throughputs for proper func- 
tionality. Of specific interest are biological transport phenomena restricted 
to an effectively one-dimensional track, such as kinesin and dynein on mi- 
crotubules, RNA polymerase on DNA during transcription, and ribosomes 
on mRNA since these offer the promise of being accessible via simple model 
systems. The last process, usually referred to as translation, is a key element 
of protein synthesis, and will be our prime focus in the following. 

One possible approach is to start with a good understanding of the phys- 
ical and biological processes involved in protein synthesis, so that a detailed 
model can be developed and tailored to a particular experimental situa- 
tion. While this promises the possibility of quantitative comparisons between 
model results and experimental data, any underlying generic characteristics 
cannot be easily identified, due to the large amount of experimental detail 
involved. An alternate approach - which has proven immensely powerful in 
equilibrium statistical mechanics - is to study simple model systems which 
can provide deep insights into generic behaviors and universal phenomena. 
Even though such simple models may not allow for immediate comparisons 
with experimental data, due to the considerable amount of simplification or 
abstraction involved, they can still guide future experimental work, espe- 
cially when good data are not yet available. In summary, both approaches 
have their strengths and drawbacks and offer complementary insights. 

In this article, we discuss one particular marriage between statistical 
physics and biology. We will approach a very complex biological process - 
protein synthesis - starting from a simple model, the totally asymmetric simple
exclusion process (TASEP). In a twist of history, this model was first suggested
in this context in the late 1960s [68, 69] and endowed with as much experimental
detail as was known at the time. Independently, models of stochastic directed
transport were proposed and studied by mathematical physicists [92]. About
two decades ago, these were (re-)discovered by the nonequilibrium statistical
physics community [62] and became paradigms of this field, much like the
Ising model for the study of phase transitions and critical phenomena. 

Our goal here is two-fold, namely, to provide a brief tutorial and to discuss
some recent developments. The next two sections are devoted to the
former and the following one to the latter. Thus, Section 2 is designed for
those new to biology, providing some basics of how proteins are synthesized
in a cell and, more specifically, a part of the process known as "translation."
Complementarily, Section 3 is designed for readers who may be interested in
getting involved with exploiting TASEP as a model for translation. Assuming
they are familiar with the basics of statistical mechanics and simulation
techniques, this section is meant to provide a self-contained, though by no
means comprehensive, review of the simplest TASEP [92] and generalizations
relevant for translation. Section 4 is devoted to two recent developments.
The first is a brief review of competition between multiple TASEPs. In the
second, we are motivated by "silent mutations" of genes to introduce a novel
notion in statistical mechanics, namely, a quenched distribution of
distributions. We end with a brief summary and outlook in Section 5.

2 Rudiments of protein synthesis 

Essentially every vital process within cells of living organisms involves pro- 
teins (a.k.a. polypeptides). These are macromolecules formed from chains of 
amino acids. For example, hemoglobin is a protein which carries oxygen from 
our lungs to the rest of our body. The blueprint for how these proteins are 
synthesized is contained in the DNA of the cell. Over the past century, bio- 
chemists gradually discovered the way this information is transformed into a 
physical molecule. Much of this body of knowledge is quite well established 
and can be found in many standard texts of microbiology [3]. For the reader's
convenience, this Section provides a brief synopsis of this process. 

Protein synthesis involves two stages: transcription of genetic informa- 
tion from DNA to messenger RNA (mRNA) by RNA polymerase and trans- 
lation from mRNA to proteins through ribosome translocation. Articulated 
by F. Crick in [22, 23], this central dogma of molecular biology captures the
essence of transferring sequence information to functional macromolecules 
(e.g. RNAs and proteins) in all life forms. One of the most complex cellu- 
lar processes, protein synthesis demands concerted actions by hundreds of 
molecules in sequential steps and typically requires a high level of regula- 
tion. Its vast demand for the energy needed to complete the reactions also 
establishes its crucial role in all metabolic pathways. Therefore, developing a 
quantitative understanding of transcription and translation processes would 
be most desirable. Indeed, this task has dominated much of recent research 
in molecular biology, as well as mathematics, physics and emerging cross- 
disciplinary fields. Of course, any model that encompasses all the biochemical 
reactions and the structural components in translation will be prohibitively 
complex. Rather, it is more practical to try to gain some insight into
the process of transforming DNA information into a polypeptide chain by
restricting our attention to single-cell organisms and identifying the most
essential ingredients. To this end, we focus on the bacterium E. coli, a
well-established model organism in molecular biology for which abundant genetic
and kinetic data are available for further analysis. In this article, we
will further restrict our attention to the second stage: translation. Our aim 
here is to condense the relevant biological information into a simple overview 
of protein synthesis, so that interested readers may actively participate in 
model-building/analysis. 

In most bacteria, such as E. coli, translation involves three main players:
the mRNA (genetic template), the ribosome (assembly machinery), and
aminoacyl transfer RNAs (aa-tRNAs), i.e., transfer RNAs "charged" with 
the corresponding amino acid. 
The mRNA carries genetic information, encoded as triplets of nucleotides.
Each triplet is known as a "codon." Since there are four nucleotides (A, U,
C, G), there are 4^3 = 64 distinct codons, e.g., AUG, CGG, etc. Except for
three "stop codons" (UAA, UGA, and UAG) that signal the termination of
translation, each of the remaining 61 "codes" for one of the 20 amino acids.
Each of the latter is conveniently denoted by a single letter: A, C, D, E, F,
G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y. Indeed, we may regard a
protein as a word, i.e., a string of letters from the above list, ranging from
~10 to ~1000 letters long. With 61 codons and 20 amino acids, there are, on
average, about three codons associated with each amino acid. As a result, a
protein with L amino acids can be coded by ~3^L distinct sequences (strings
of codons or mRNAs). To complicate matters, some amino acids are coded
by a single codon (e.g., AUG for methionine/M and UGG for tryptophan/W),
while others are associated with as many as 6 (e.g., CGU, CGC, CGA, CGG,
AGA, AGG for arginine/R). Therefore, the exact protein-mRNA degeneracy
depends on the amino acid composition of the protein (though not on the
order of its amino acids). Codons coding for the same amino acid are termed
"synonymous." In the example above, CGU and AGG are synonymous codons.
Two sequences which differ only by synonymous codons are said to differ by
"silent mutations," in the sense that both produce the same protein
(polypeptide chain). In addition to the string of codons, there is a sequence
of nucleotides near the beginning of the mRNA, upstream of the start codon,
known as the "Shine-Dalgarno sequence" (SD sequence) [55].
This region controls the binding of a ribosome to the mRNA, which is also the
start of the process of translation. Known as "initiation" in biology, even this
starting event is quite complex [61, 73, 77, 72, 79, 66, 5], requiring the presence
of several initiator proteins.

The next major player is the ribosome, a sizable molecule composed of 
a large and a small subunit. Within the ribosome, there are three sites to 
which a tRNA can bind and unbind. Designated as A, P, and E, these are, 
respectively, the aminoacyl site (for docking of an aa-tRNA), the peptidyl site 
(for transferring and binding the newly arrived amino acid to the partially 
formed polypeptide chain) and the exit site (for releasing the tRNA). 

The process of translation consists of ribosomes moving along the mRNA 
without backtracking (from one end to the other, technically known as the 5' 
end to the 3' end) and is conceptually divided into three major stages: initia- 
tion, elongation and termination. Among the three steps, initiation is of the 
highest complexity and has seen significant developments in unravelling its 
molecular details [61, 73, 77, 72, 79, 66, 5]. Here, the ribosome interacts with
the SD sequence through complementary base-pairing and locates the start
codon AUG with the help of several initiator proteins. Translation begins
with the assembly of the two subunits of the ribosome, along with a tRNA
charged with the M amino acid in the P-site. The next steps, elongation and
termination, are also quite intricate [82, 90, 9, 24, 76, 42, 75, 57]. The ribosome
moves along the mRNA, "reading" codon by codon, recruiting the appropri- 
ate aa-tRNA, "knitting" the latest amino acid into the partially completed 
chain, and releasing the "discharged" tRNA. This cyclic process consists of 
the following steps. (i) The last amino acid of a partial chain is attached
to its tRNA at the P-site, aligned with a certain codon. (ii) An aa-tRNA,
correctly matched with the next codon, docks at the A-site. (iii) The bond
linking the partial chain to the tRNA at the P-site breaks, and the chain is
joined, via a new peptide bond, to the amino acid at the A-site. (iv) The
ribosome moves forward so that the two tRNAs are now at the E- and P-sites.
(v) The discharged tRNA in the E-site is released, leaving the A-site empty
for the next aa-tRNA. Finally, when the ribosome encounters one of the three
stop codons, the termination process commences: the ribosome dissociates,
while the completed amino acid chain is released (and folds into a functioning
protein). The whole process is quite involved and, instead of providing a
figure here, we direct interested readers to one of the many helpful animations
on the World Wide Web.

The third major set of players is the tRNAs. One end of
this class of molecules carries one of the many anticodons (e.g., UAC, to
match AUG; see footnote 1). The other end is an acceptor stem, to which an
appropriate amino acid (one of 20) can be attached, forming a "charged" aa-tRNA.
Normally, the mapping from amino acids to anticodons is one-to-many;
details may be found in, e.g., [96]. For our purposes, the main concern is the
rather dissimilar set of degeneracies, i.e., the number of synonymous codons,
$m_{aa}$, for the amino acid aa:

$$
m_{aa} = \begin{cases}
1, & aa \in \{M, W\} \\
2, & aa \in \{C, D, E, F, H, K, N, Q, Y\} \\
3, & aa \in \{I\} \\
4, & aa \in \{A, G, P, T, V\} \\
6, & aa \in \{L, R, S\}
\end{cases}
\tag{1}
$$



Note that 3 codons are reserved for termination, so the total here is only 61. 
Meanwhile, the concentrations of these aa-tRNAs in a typical cell are known
to be far from uniform. Indeed, for E. coli, the relative abundances can differ by
as much as a factor of 15 [33, 95, 54]. Since the elongation rate is believed to
be correlated with aa-tRNA availability (as a ribosome must "wait" for
the appropriate aa-tRNA to arrive before proceeding) [33, 67, 90, 64], the time
it takes a ribosome to complete translation can vary widely, depending on
the codon sequence. Further, the rate of protein synthesis depends not only
on the speed of a single ribosome, but also on how many ribosomes may be
translating simultaneously (on the same mRNA). Therefore, the wide range
of degeneracies shown in (1) implies that silent mutations may have serious
implications for protein production rates.
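The degeneracies in (1) translate directly into a count of how many distinct mRNAs encode a given protein. The Python sketch below is only an illustration: the table is the standard genetic code as summarized in (1), the helper name `synonymous_mrnas` is hypothetical, and the count ignores the choice among the three stop codons (including it would multiply every count by 3).

```python
from math import prod

# Synonymous-codon counts m_aa from Eq. (1) (standard genetic code).
DEGENERACY_CLASSES = [("MW", 1), ("CDEFHKNQY", 2), ("I", 3), ("AGPTV", 4), ("LRS", 6)]
M_AA = {aa: m for letters, m in DEGENERACY_CLASSES for aa in letters}


def synonymous_mrnas(protein):
    """Count the codon strings (stop codon excluded) that translate to `protein`.

    Each amino acid can be written with any of its m_aa synonymous codons,
    independently of its neighbours, so the count is the product of the m_aa.
    """
    return prod(M_AA[aa] for aa in protein)


# Consistency checks with the text: 61 sense codons, about 3 codons per amino acid.
total_sense_codons = sum(M_AA.values())    # 61
mean_degeneracy = total_sense_codons / 20  # 3.05
```

For example, the hypothetical tripeptide "MKR" has 1 × 2 × 6 = 12 synonymous codings, and reordering it to "RKM" gives the same count, since only the amino acid composition matters.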

Finally, for a cell to function properly, the important quantities to be 
controlled are, presumably, the levels of various proteins. In a steady state, 
a protein's level depends on the rate of its degradation as well as its
production. Assuming the former is the same for all proteins, the
concentration of any particular type follows its production rate closely, and the latter
is just the average current associated with translating that mRNA. Thus, 
our main interest here is the following question: How are protein production 

1 Simplistically, there would be 61 anticodons to match the 61 codons. However,
nature is more complicated. Most cells contain fewer, due to "wobbling." For E. coli,
there are 46 distinct anticodons. We will ignore this extra complication here and
discuss translation as if there were 61 anticodons.
rates correlated with specific sequence information? Of course, translation
in real biological systems is much more complex, involving subtleties such 
as prokaryotes vs. eukaryotes, initiation and elongation factors, wobbling, 
signaling, and regulation in response to external conditions, etc. All are be- 
yond the scope of this article, as we focus on a few key ingredients, build 
the simplest models, explore their behavior, and make some inroads into the 
remarkable processes of life. 

3 The TASEP and its generalizations 

In this section, we turn to another topic: the totally asymmetric simple exclu- 
sion process (TASEP), which has been exploited to model protein synthesis 
for over 40 years. After a brief historic introduction, we review recent progress 
on simple models and generalizations to account for more realistic conditions 
in biological systems. 

3.1 Early history: Studies by Gibbs et al. and Spitzer

Around 1970, TASEP was introduced from two entirely distinct motivations, 
one from biology and the other from mathematics. Not surprisingly, the two 
approaches are also quite different. In this subsection, we devote brief para- 
graphs to each. 

Gibbs and collaborators [68, 69] were interested in a quantitative
description of translation, the process reviewed briefly above. The mRNA, the
codons, and the ribosomes are modeled by, respectively, a one-dimensional
open lattice, its sites, and particles (which enter the lattice at one end, hop
unidirectionally from site to site, and exit at the other end). Aware of the
large size of ribosomes compared to codons, these workers began with
particles which "cover" $\ell > 1$ sites. Further, their formulation allowed for the
possibility that particle hopping rates can be bidirectional, as well as being
site- and time-dependent. Setting up the full stochastic problem, with a
master equation for the probability to find particles at each site, is relatively easy.
However, no one has been able, so far, to solve such a complex problem, even
for the steady-state distribution with time-independent rates. Nevertheless,
there has been significant progress since 1968, on various fronts. Historically,
Gibbs et al. focused on a system with uniform and time-independent rates,
ignored some correlations between ribosomes while accounting for the strict
exclusion due to $\ell > 1$, and set up recursion relations for $\rho_i$, the density
of particles at site $i$ (i.e., the average occupation of site $i$ by, say, the
trailing edge of a particle), with the current $J$ as a to-be-determined unknown.
Their analysis of these recursion relations, using both analytic and numerical
techniques, led them to several important conclusions. Examples include a
non-trivial $\ell$-dependent relationship between $J$ and the overall density $\rho$, as
well as the existence of different phases as functions of the entry/exit rates.
Since their model was too far from real biological systems, and experimental
techniques in biology were too primitive to probe occupations at the
molecular level, their results lay largely dormant until recent years.
At the other front, Spitzer was interested in Markov processes involving
interacting particles [92]. The simplest interaction would be just exclusion,
i.e., each site can be occupied by at most one particle - thus the "simple
exclusion" part of TASEP. The simplest system would be a one-dimensional
periodic lattice, i.e., a ring of $L$ sites. Placing $N$ particles on such a ring and
letting them hop randomly to nearest-neighbor sites obviously leads to
"simple diffusion." Even in such a minimal system, highly non-trivial behavior
can be found [31], but we will focus on dynamics that more closely resemble those
of protein synthesis, namely, hopping is allowed only in one direction. Thus,
we turn to the "totally asymmetric" part of TASEP. Though the steady state
is still simple, its dynamics displays even more intricate properties [26, 65, 70,
..., 56, 44, 45]. Of course, Spitzer also considered lattices with open
boundaries and particles hopping on/off the lattice at the ends. But this problem
was sufficiently more complex than the ring that even its steady-state
distribution remained elusive for another two decades. In the next subsection, we
specify Spitzer's model and briefly summarize some of the progress since the
early 1990s.



3.2 The proto model and its properties 

In this model, the sites of a one-dimensional lattice, labeled by $i = 1, \ldots, L$,
may be occupied by 0 or 1 particle, so that a configuration of the system
is specified by $\{n_i\}$, with $n_i \in \{0, 1\}$. Starting from some initial configuration with $N < L$
particles, the rules of evolution are the following. At each update step, a site
is randomly chosen and, provided it is occupied and the next site (in the
direction of motion) is empty, an attempt to exchange the particle-hole
pair is made. With probability $\gamma$, the attempt is successful and
the particle is moved to the next site. For the process on a ring, site $L$ is
connected to site 1. A Monte Carlo step (MCS) is then defined as $L$ attempts,
so that, on average, each site is chosen once per MCS. This way,
it makes sense to compare systems with very different $L$'s, run for the
same number of MCS. For the open TASEP, two additional rules apply: site
1, if empty, is filled with probability $\alpha$, and a particle on site $L$ leaves the
lattice with probability $\beta$. An MCS in this case consists of $L + 1$ attempts.
For the rest of this article, we will focus on the open case, since it resembles
the process of protein synthesis. Its rules are summarized schematically in
Fig. 1. The central question is: What is $P(\{n_i\}, t)$, the probability of finding
the system in configuration $\{n_i\}$ after $t$ attempts? A master equation for $P$
can be easily written (for open TASEP):

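$$
P(\{n'_i\}, t+1) - P(\{n'_i\}, t) = \sum_{\{n_i\}} \mathcal{L}(\{n'_i\}, \{n_i\})\, P(\{n_i\}, t)
\tag{2}
$$

where

$$
\mathcal{L}(\{n'_i\}, \{n_i\}) \equiv W(\{n'_i\}, \{n_i\}) - \delta(\{n'_i\}, \{n_i\}) \sum_{\{n''_i\}} W(\{n''_i\}, \{n_i\})
$$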



Fig. 1 Schematic summary of the rules of the proto TASEP. Particles cannot hop
backwards or onto an occupied site (red online). They move forward to empty
sites with rate $\gamma$ (green online). Entry and exit rates are denoted by $\alpha$ and $\beta$,
respectively (green online).



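is known as the Liouvillian (which plays a role similar to the Hamiltonian in
quantum mechanics) and $\delta$ is the Kronecker delta. Here, $W(\{n'_i\}, \{n_i\})$ is
the transition probability from $\{n_i\}$ to $\{n'_i\}$:

$$
\begin{aligned}
W(\{n'_i\}, \{n_i\}) = \frac{1}{L+1} \Big[\,
& \alpha (1 - n_1)\, \delta(n'_1, n_1 + 1) \prod_{j \neq 1} \delta(n'_j, n_j) \\
& + \sum_{k=1}^{L-1} \gamma\, n_k (1 - n_{k+1})\, \delta(n'_k, n_k - 1)\, \delta(n'_{k+1}, n_{k+1} + 1) \prod_{j \neq k, k+1} \delta(n'_j, n_j) \\
& + \beta\, n_L\, \delta(n'_L, n_L - 1) \prod_{j < L} \delta(n'_j, n_j) \Big]
\end{aligned}
\tag{3}
$$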



where the changes $n \to n \pm 1$ are explicitly displayed. However, finding the
solution to Eq. (2) is far more difficult. A simpler question is: What is the
stationary distribution, $P^*(\{n_i\})$, assuming the system settles into such a
$t$-independent state at large times? Once it is known, other natural questions
arise: What are the macroscopic properties of the system in this state? Of
particular interest is how $\alpha$ and $\beta$ control the averages of observables,

$$
\langle \mathcal{O} \rangle \equiv \sum_{\{n_i\}} \mathcal{O}(\{n_i\})\, P^*(\{n_i\}),
$$

such as the density profile $\rho_i = \langle n_i \rangle$. Once we have $\rho_i$, other quantities of
interest can be computed. In particular, we will be mostly interested in the
overall density, $\rho = \sum_i \rho_i / L$, and the average current, $J = \beta \rho_L$ (i.e., the
average number of particles that enter/exit the lattice in a Monte Carlo step).
The answers, some known to Gibbs et al., became well established
over the last two decades. Setting $\gamma$ to unity (without loss of generality), the
system can be found in three distinct phases in the $\alpha$-$\beta$ plane - a half-filled
phase with maximal current and high/low density phases [62], denoted by
MC, HD, and LD, respectively, in Fig. 2.
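For very small lattices, the stationary solution of the master equation (2)-(3) can be found by brute force, which is a handy check on simulation code. The Python sketch below is our own illustration, not a method taken from the works cited here: it enumerates all $2^L$ configurations, accumulates the gain and loss terms of the Liouvillian, and solves for $P^*$ with exact rational arithmetic. The function names are hypothetical, and the overall factor $1/(L+1)$ in (3) is dropped, since it only converts attempts into MCS; the returned $J = \beta\rho_L$ is then the current per MCS.

```python
from fractions import Fraction
from itertools import product


def moves(config, alpha, beta, gamma=1):
    """Yield (new_config, rate) for every allowed move of the open TASEP, Eq. (3).

    Rates are per MCS: the 1/(L+1) prefactor of Eq. (3) is absorbed into time.
    """
    n = list(config)
    L = len(n)
    if n[0] == 0:                                  # entry onto site 1
        yield tuple([1] + n[1:]), alpha
    for k in range(L - 1):                         # bulk hop k -> k+1 (0-based indices)
        if n[k] == 1 and n[k + 1] == 0:
            m = n.copy()
            m[k], m[k + 1] = 0, 1
            yield tuple(m), gamma
    if n[-1] == 1:                                 # exit from site L
        yield tuple(n[:-1] + [0]), beta


def stationary_distribution(L, alpha, beta, gamma=1):
    """Solve sum_n Liouvillian(n', n) P*(n) = 0 with sum_n P*(n) = 1, exactly."""
    states = list(product((0, 1), repeat=L))
    idx = {s: i for i, s in enumerate(states)}
    size = len(states)
    A = [[Fraction(0)] * size for _ in range(size)]
    for s in states:
        for t, rate in moves(s, alpha, beta, gamma):
            A[idx[t]][idx[s]] += rate              # gain term W(t, s)
            A[idx[s]][idx[s]] -= rate              # loss term
    A[-1] = [Fraction(1)] * size                   # replace one redundant row by normalization
    b = [Fraction(0)] * (size - 1) + [Fraction(1)]
    for col in range(size):                        # Gauss-Jordan elimination
        piv = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(size):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return {s: b[idx[s]] / A[idx[s]][idx[s]] for s in states}


def exact_current(L, alpha, beta, gamma=1):
    """J = beta * rho_L: mean number of exits per MCS in the steady state."""
    P = stationary_distribution(L, alpha, beta, gamma)
    return beta * sum(p for s, p in P.items() if s[-1] == 1)
```

For instance, `exact_current(2, 1, 1)` gives `Fraction(2, 5)`, and for $\alpha = \beta = 1$ the values decrease toward $1/4$ as $L$ grows. Because the state space grows as $2^L$, this approach is only practical for lattices of a few sites.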
Fig. 2 Phase diagram of the proto TASEP. Maximal current and high/low density
phases are denoted by MC, HD, and LD respectively. The transition between the 
MC phase and the other two (dashed lines) is continuous. The transition across the 
HD-LD boundary (solid, red online) is discontinuous. 



Transitions from the maximal current (MC) phase to the other two phases
(HD/LD) are continuous, and display critical behavior similar to second-order
phase transitions in equilibrium. Indeed, critical properties, such as algebraically
decaying correlations, can be found in the entire MC phase. Across the HD-LD
boundary, the transition is discontinuous, and, on the line itself, the system
displays coexistence of HD and LD. Specifically, the HD and LD regions are
macroscopic, separated by a microscopic interface, referred to as a "shock."
As in many equilibrium systems with coexistence, such an interface can be
located anywhere. In TASEP, the shock performs a random walk (reflected
only from the ends), so that the average density profile is linear in $i$,
interpolating between $\rho_{LD}$ and $\rho_{HD}$. In the literature, this line is often referred to
as the "shock phase" (SP). Setting up a phenomenological theory for the
behavior of this shock, known as domain wall theory, several authors have been
successful in predicting many properties of TASEP outside the MC region [7,
83]. The exact $P^*(\{n_i\})$ was found, from which $J(\alpha, \beta)$ and
$\rho(\alpha, \beta)$ can be computed analytically for all $\alpha, \beta$. In the $L \to \infty$ limit, these
are remarkably simple: $J = \rho(1 - \rho)$ always, while $\rho = 1/2$, $1 - \beta$, and $\alpha$ in the
MC, HD, and LD phases, respectively. Thus, $J \le 1/4$ in general. More recently,
considerable progress was made using the powerful Bethe ansatz [59, 38, 74,
...], so that the complete spectrum and all the eigenvectors of $\mathcal{L}$
are accessible. Consequently, some of the more complex, dynamic properties
of TASEP are also exactly known. Details of this large body of results are
beyond the scope of this article. The interested reader may consult several
comprehensive reviews, such as [...].
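A direct Monte Carlo simulation of the rules above is a convenient way to see these results emerge. The Python sketch below is a minimal illustration under our own choices (random-sequential updates with $L+1$ attempts per MCS, an initially empty lattice, and arbitrary warm-up and run lengths); the function name and its parameters are hypothetical. It measures $J$ as exits per MCS and $\rho$ as the time-averaged overall density, which for large $L$ should approach $J = \rho(1-\rho)$ with $\rho = \alpha$ (LD), $1-\beta$ (HD), or $1/2$ (MC), up to finite-size and sampling errors.

```python
import random


def simulate_open_tasep(L, alpha, beta, gamma=1.0, mcs=10_000, warmup=2_000, seed=1):
    """Random-sequential Monte Carlo for the open TASEP with point particles.

    Each MCS consists of L + 1 attempts: choice k = 0 is an entry attempt,
    1 <= k < L a bulk hop k -> k+1, and k = L an exit attempt.
    Returns (J, rho): exits per MCS and the time-averaged overall density.
    """
    rng = random.Random(seed)
    n = [0] * (L + 2)                 # n[1..L] are the lattice sites; start empty
    exits = 0
    occupied_sum = 0
    for sweep in range(warmup + mcs):
        for _ in range(L + 1):
            k = rng.randrange(L + 1)
            if k == 0:
                if n[1] == 0 and rng.random() < alpha:
                    n[1] = 1
            elif k == L:
                if n[L] == 1 and rng.random() < beta:
                    n[L] = 0
                    if sweep >= warmup:
                        exits += 1
            elif n[k] == 1 and n[k + 1] == 0 and rng.random() < gamma:
                n[k], n[k + 1] = 0, 1
        if sweep >= warmup:           # sample the density once per MCS
            occupied_sum += sum(n[1:L + 1])
    return exits / mcs, occupied_sum / (mcs * L)
```

For instance, `simulate_open_tasep(100, 0.2, 0.8)` should return values near $J \approx 0.16$ and $\rho \approx 0.2$ (LD phase). Close to the phase boundaries, longer runs and larger $L$ are needed, because relaxation slows down there.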

Despite this comprehensive knowledge of $\mathcal{L}$, there are seemingly simple
questions about this system for which simple answers are not available. An
example is the power spectrum associated with $N = \sum_i n_i$, the total number
of particles on the lattice. Specifically, we record a time series $N(t)$ over a
run and construct its Fourier transform, $\tilde{N}(\omega)$. Carrying out many runs and
taking the average, $\overline{\cdots}$, the power spectrum is

$$
I(\omega) \equiv \overline{|\tilde{N}(\omega)|^2}.
$$

Note that this average contains information on the dynamics and is therefore
not related to the static average, $\langle \cdots \rangle$, above. If the runs are taken when
the system is in the steady state, then $I(0)$ is, of course, known, being set by
the static average $\langle N \rangle = \rho L$. But
$I(\omega > 0)$ displays more interesting behavior, such as oscillations (in $\omega$) in the
HD/LD phases [2, 20]. Although the physics behind these is understandable
and approximate theories provide reasonable fits, an exact analytic formula
is not known (except formally; see footnote 2). In the remainder of this article, we will
look beyond this proto model and focus on generalizations which take into
account some other essential ingredients in the process of protein synthesis.
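The averaging procedure behind $I(\omega)$ can be sketched as follows. This minimal Python illustration assumes that $N(t)$ is sampled once per MCS and uses the discrete frequencies $\omega_m = 2\pi m/T$ with no normalization prefactor; both are our own conventions, not necessarily those of the cited studies, and the function name is hypothetical. A plain $O(T^2)$ discrete Fourier transform keeps it dependency-free; for long runs an FFT routine would be the practical choice.

```python
import cmath
import math


def power_spectrum(runs):
    """Estimate I(omega_m) as the average over runs of |N~(omega_m)|^2.

    runs: list of equal-length time series N(t), t = 0..T-1 (one sample per MCS).
    Uses omega_m = 2*pi*m/T and N~(omega) = sum_t N(t) exp(i*omega*t), no prefactor.
    """
    T = len(runs[0])
    spectrum = [0.0] * T
    for series in runs:
        if len(series) != T:
            raise ValueError("all runs must have the same length")
        for m in range(T):
            omega = 2 * math.pi * m / T
            amplitude = sum(N_t * cmath.exp(1j * omega * t) for t, N_t in enumerate(series))
            spectrum[m] += abs(amplitude) ** 2
    return [value / len(runs) for value in spectrum]
```

In practice, each series would be recorded as $N(t) = \sum_i n_i$ in a TASEP simulation after the system has reached the steady state; averaging over many independent runs then suppresses the large run-to-run scatter of each $|\tilde N(\omega_m)|^2$.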

Before continuing, let us point out an equivalent formulation of the open
TASEP, but based on a ring. Conceptually simpler and commonly used in
simulations, this version will prove most natural in the contexts to
be presented below. Here, we consider a periodic lattice with $L + 1$ sites
filled with a total number, $N_{tot}$, of particles. Considering the role it plays,
we will refer to the extra site, $i = 0$, as the "reservoir" or the "pool" (see
footnote 3). The rules associated with this site are, of course, quite different
from those in the bulk: (i) It has unlimited occupation, so that we are guaranteed
$n_0 \ge 1$ by imposing $N_{tot} \ge L + 1$. (ii) If it is chosen for updating, one of its particles
is moved to $i = 1$ with probability $\alpha(1 - n_1)$. (iii) If site $L$ is chosen and
$n_L = 1$, the particle hops into the pool with probability $\beta$, regardless of $n_0$.
By denoting $\alpha, \beta$ as $\gamma_0, \gamma_L$, we may regard them as part of a full set of
site-dependent hopping rates $\{\gamma_k\}$. Incorporating the special rules for site 0, we
can replace $[\ldots]$ in (3) by a succinct expression

$$
\sum_{k=0}^{L} \gamma_k \left[ n_k + \delta_{k,0} (1 - n_k) \right] \left[ 1 - n_{k+1} + \delta_{k,L}\, n_{k+1} \right] \prod_{j=0}^{L} \delta\!\left(n'_j, n_j - \delta_{j,k} + \delta_{j,k+1}\right)
\tag{4}
$$

(with $n_{L+1} \equiv n_0$, etc.). Note that this $\prod_j \delta$ includes all the possible changes
in $\{n_j\}$. To re-emphasize, $N_{tot} = n_0 + N$ is conserved in this formulation.
However, as long as $N_{tot} \ge L + 1$, the properties of the open TASEP above
are identical to those of the $i \in [1, L]$ part of our ring, and $N$ can fluctuate in the
range $[0, L]$ as before.
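The ring-with-pool formulation, with site-dependent rates $\{\gamma_k\}$, translates almost line by line into code. In the Python sketch below (our own illustration; the function name, run lengths, and the random-sequential update are assumptions), `gammas[0]` plays the role of $\alpha$ and `gammas[L]` that of $\beta$, site 0 is the pool, and conservation of $N_{tot}$ is checked at the end. Lowering one bulk rate below 1 gives a simple "slow codon" defect.

```python
import random


def simulate_ring_with_pool(gammas, n_tot, mcs=8_000, warmup=2_000, seed=1):
    """Random-sequential TASEP on a ring of L+1 sites whose site 0 is an unlimited pool.

    gammas[k] is the rate for a particle to leave site k (k = 0..L), so
    gammas[0] = alpha and gammas[L] = beta. Returns the current J (exits per MCS).
    """
    L = len(gammas) - 1
    if n_tot < L + 1:
        raise ValueError("need N_tot >= L + 1 so that the pool never empties")
    rng = random.Random(seed)
    n = [n_tot] + [0] * L            # n[0]: pool occupancy; n[1..L]: lattice sites
    exits = 0
    for sweep in range(warmup + mcs):
        for _ in range(L + 1):       # one MCS = L + 1 attempts
            k = rng.randrange(L + 1)
            nxt = (k + 1) % (L + 1)  # site L feeds back into the pool (site 0)
            # The pool never blocks a hop into it; lattice sites hold at most one particle.
            target_free = nxt == 0 or n[nxt] == 0
            if n[k] >= 1 and target_free and rng.random() < gammas[k]:
                n[k] -= 1
                n[nxt] += 1
                if k == L and sweep >= warmup:
                    exits += 1
    assert sum(n) == n_tot           # N_tot = n_0 + N is conserved
    return exits / mcs
```

For example, with $L = 40$ and $\alpha = \beta = 1$, setting `gammas[20] = 0.3` in an otherwise uniform list should visibly lower $J$ below its defect-free value. Moving that slow site, or adding a second one at varying distance, is a direct way to explore how defect placement affects the current.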

3.3 Generalizations of TASEP 

As noted above, Gibbs et al. [68, 69] were aware of the size of a ribosome
compared to a codon, so that Spitzer's simple TASEP must be generalized 

2 In this case, the difficulties lie mainly in computing the average of nonlocal (in 
both space and time) operators. 

3 The notion of particle reservoirs has been used in the literature, with one major
difference: unlike here, open TASEPs were coupled to two unrelated reservoirs,
one at each end.
Fig. 3 "Complete entry, incremental exit" rule for an $\ell = 3$ case. The gray dots
(green online) denote the "readers." Since the reader of the second particle is on
$i = 4$, it will hop with probability $\gamma_4$.



to having "particles" which extend over $\ell > 1$ sites. Indeed, from the latest
data, $\ell \approx 12$ seems to be the most appropriate [52, 69, 68]. This
generalization requires some modifications to the rules. Since the ribosome appears to
"read" the codon over its A-site, it is most natural to associate one
particular part of the extended particle with the "reader" [55]. After some thought,
it is also clear that, as far as TASEP is concerned, which part is labeled
the reader is irrelevant. For convenience, we choose the reader to be at the
trailing edge of the particle [55]. To "read" the first codon (site 1 on the
lattice), the ribosome/particle must enter the lattice, and for that to occur,
the first $\ell$ sites must be empty. On the other hand, while the ribosome is
"reading" the last $\ell$ codons, it must be the leading particle on the lattice, with
no others ahead to impede its progress. Therefore, it can move without hindrance
toward the exit end. The new set of entry/exit rules is known as "complete
entry, incremental exit" [...] (see Fig. 3 for an illustration).
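The "complete entry, incremental exit" rule can be sketched with an array of reader positions plus a coverage array that extends $\ell$ virtual sites past the exit, so that the leading particle slides off the lattice one site at a time. This Python sketch is our own illustration under assumed conventions (random-sequential updates, $L+1$ attempts per MCS, readers at the trailing edge as chosen above, a hypothetical function name and run lengths); it returns the particle current $J$ in exits per MCS. For $\ell = 1$ it reduces to the ordinary open TASEP.

```python
import random


def simulate_ell_tasep(L, ell, alpha, beta, gammas=None, mcs=5_000, warmup=1_000, seed=1):
    """Open TASEP of particles covering `ell` sites, with readers at the trailing edge.

    gammas[i] (1 <= i < L) is the hop rate of a particle whose reader sits on site i.
    Returns the particle current J, measured as exits per MCS.
    """
    if gammas is None:
        gammas = [1.0] * (L + 1)
    rng = random.Random(seed)
    reader = [False] * (L + 1)          # reader[i]: some particle's reader is on site i
    cover = [0] * (L + ell + 1)         # sites > L are virtual, used only by an exiting particle
    exits = 0
    for sweep in range(warmup + mcs):
        for _ in range(L + 1):
            k = rng.randrange(L + 1)
            if k == 0:
                # Complete entry: the first ell sites must all be empty.
                if not any(cover[1:ell + 1]) and rng.random() < alpha:
                    reader[1] = True
                    for s in range(1, ell + 1):
                        cover[s] += 1
            elif k == L:
                # Exit once the reader has read the last codon (site L).
                if reader[L] and rng.random() < beta:
                    reader[L] = False
                    for s in range(L, L + ell):
                        cover[s] -= 1
                    if sweep >= warmup:
                        exits += 1
            elif reader[k] and cover[k + ell] == 0 and rng.random() < gammas[k]:
                # Incremental move: vacate the trailing site, occupy the one ahead.
                reader[k], reader[k + 1] = False, True
                cover[k] -= 1
                cover[k + ell] += 1
    # Sanity check on the final configuration: no site is doubly covered or negative.
    assert max(cover) <= 1 and min(cover) >= 0
    return exits / mcs
```

Comparing runs with `ell=1` and larger `ell` at $\alpha = \beta = 1$ shows directly how extended particles reduce the maximal current. Position-dependent `gammas` can be supplied to mimic codon-specific elongation rates.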

These seemingly modest changes of the rules have profound consequences.
At the simplest level, we must now distinguish between particle (ribosome)
density and "coverage" density (number of sites "covered" by a particle per
unit length). Denoting the former by $\rho_r$ and the latter by $\rho$, we see that the
overall densities differ by a factor of $\ell$ ($\rho = \ell \rho_r$), and the two profiles are
related by

$$
\rho_i = \sum_{k=0}^{\ell-1} \rho^r_{i-k}.
$$

If $\rho_h = 1 - \rho$ denotes the hole density, we also have $\rho_r + \rho_h \le 1$. Although
the phase diagram and the current-density relation $J(\rho; \ell)$ are qualitatively
unchanged [88, 13, 34], an exact solution (for $P^*$ or $J$ or $\rho$) remains elusive.
Stationary profiles are much more seriously affected, especially in the HD
phase [34]. On the other hand, for a homogeneous TASEP on a ring, $P^*$ is
known to be uniform [92], so that an exact $J(\rho; \ell)$ can be derived [55, 13] (see
also footnote 4).

4 Other exact results (also for a collection of particles with different $\ell$'s) have
been found recently. See, e.g., [4].
To be precise, we denote the particle current by $J$, but write it in terms of $\rho$
instead of $\rho_r$:

$$
J(\rho; \ell) = \frac{\rho (1 - \rho)}{\ell - (\ell - 1)\rho}.
\tag{5}
$$

A more elegant version of this formula is

$$
J^{-1} = \rho_r^{-1} + \rho_h^{-1},
\tag{6}
$$

in which the second term on the right accounts for steric hindrance. On the
left is the average time between successive particles (with exclusion) exiting
the lattice. On the right, we have the sum of such times for non-interacting
particles and holes. This connection is quite remarkable.

Returning to Eq. (5), we see that $J(\rho)$ still rises from zero, reaches
a maximum, and returns to zero. However, its upper bound is lowered to
$(1 + \sqrt{\ell})^{-2}$, i.e., by a factor of $O(\ell)$ for large $\ell$ [...]. While this $J(\rho; \ell)$ is not
rigorously the same as the one in an open TASEP, it can be argued that, in
the $L \to \infty$ limit, the two should be the same. As noted above, Gibbs et al.
arrived at the same $J(\rho; \ell)$ long ago, by accounting for some effects of the
$\ell$-exclusion approximately. This is one of the few reasonably well understood
aspects of TASEP with extended objects. In passing, we should mention that
TASEPs with polydispersed particles on a ring have also been studied [3],
though their relevance to protein synthesis seems remote.
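Eqs. (5) and (6), and the location of the maximum, are easy to check numerically. The short Python sketch below (function names are our own) evaluates both forms of $J(\rho; \ell)$ and locates the maximum by ternary search, which is valid here because $J$ has a single interior maximum on $0 < \rho < 1$. Analytically, setting $dJ/d\rho = 0$ in (5) gives $\ell - 2\ell\rho + (\ell - 1)\rho^2 = 0$, whose root in $(0, 1)$ is $\rho^* = \sqrt{\ell}/(1 + \sqrt{\ell})$, where $J = (1 + \sqrt{\ell})^{-2}$.

```python
import math


def current_eq5(rho, ell):
    """Eq. (5): J(rho; ell) for ell-mers on a homogeneous ring, gamma = 1."""
    return rho * (1 - rho) / (ell - (ell - 1) * rho)


def current_eq6(rho, ell):
    """Eq. (6): 1/J = 1/rho_r + 1/rho_h, with rho_r = rho/ell, rho_h = 1 - rho (0 < rho < 1)."""
    rho_r, rho_h = rho / ell, 1 - rho
    return 1 / (1 / rho_r + 1 / rho_h)


def max_current(ell, iterations=200):
    """Ternary search for the maximum of Eq. (5) on 0 < rho < 1 (J is unimodal there)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if current_eq5(a, ell) < current_eq5(b, ell):
            lo = a
        else:
            hi = b
    rho_star = (lo + hi) / 2
    return rho_star, current_eq5(rho_star, ell)
```

For $\ell = 12$, for instance, the maximal current is about $0.050$ (at $\rho^* \approx 0.78$), roughly a fifth of the point-particle value $1/4$.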

A second essential aspect of our problem was also recognized by Gibbs
et al. [68, 69], namely, site-dependent hopping rates, i.e., inhomogeneous
TASEPs. In Section 2, we indicated the rationale for considering such a
difficult problem: non-uniform aa-tRNA abundance. Needless to say, it is
prohibitively difficult to determine quantities like $J$ and $\rho$ for a TASEP with an
arbitrary set of rates, $\{\gamma_i\}$. Even when restricted to point particles ($\ell = 1$)
on a homogeneous ring, the introduction of a single "defective site" (with
$\gamma \neq 1$ but no changes to the rules of exclusion) renders the problem
unsolvable (i.e., no exact $P^*$) so far. The non-trivial consequences and serious
challenges were noted as early as 1992 [55] (see footnote 5). With several defects,
systematic studies become less manageable, even with approximate or numerical
methods. For an open TASEP with a few defect sites, progress was made mainly
with Monte Carlo simulations, while some understanding is possible by
exploiting mean-field approximations of various levels [60, 41, ...]. Most relevant
to modeling translation is the discovery that the current (for $\alpha = \beta = 1$)
depends on the location of slow defects ($\gamma < 1$) [14, 35, 34, 41, 36]. In particular,
if there are two slow sites in the bulk, the distance between them affects $J$
seriously [14, 35, 34, 41]. This implies that protein production rates can be
significantly suppressed if codons associated with rare aa-tRNAs are clustered
in the gene. At the other extreme, several groups studied TASEPs with a
full set of quenched random rates, $\{\gamma_i\}$, each of which is chosen from
some distribution (e.g., Gaussian, two-valued, etc.) [94, 8, 63, ...]. These
authors considered only point particles and focused on the effect of disorder

5 Remarkably, exact results are available if a single particle hops more "slowly".
on the (quenched average) current-density relation. Using simulations and mean-field approximations, J(ρ) is found to develop a plateau in a region around ρ = 1/2, details of which depend on the variance of the distribution of the inverse rates, 1/γ. The phase diagram remains qualitatively the same, with three phases that resemble MC, HD, and LD. Not surprisingly, the main effect is that the transitions are no longer sharp. Beyond these studies, disordered TASEPs with ℓ > 1 are yet to be explored.

To model protein synthesis more realistically, we need a combination of at least three ingredients: (a) open boundary conditions, (b) extended objects (say, ℓ = 12), and (c) inhomogeneous rates, {γ_i}. As we noted from the historical perspective, it took some time to arrive at full solutions for TASEPs with ingredients (a) or (b). Yet, here we would prefer to include all three aspects and ask for, at the least, the average current J(α, β, {γ_i}, ℓ) and the overall density ρ(α, β, {γ_i}, ℓ). Clearly, this program is extremely ambitious, even if we restrict our investigations to the Monte Carlo approach. In Section 4.2 we will present a very simple, yet reasonably reliable, method to arrive at a good estimate for the current.
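The Monte Carlo route mentioned above can be sketched in a few dozen lines. The Python code below is our own illustrative random-sequential-update simulation combining ingredients (a) to (c); it is not the code used in the studies cited here, and its boundary conventions are assumptions: a particle of size ℓ is labelled by its leftmost ("reader") site p, enters with rate α when sites 0 to ℓ − 1 are empty, hops from p to p + 1 with rate gammas[p] when site p + ℓ is empty, and leaves as a whole with rate β once p = L − ℓ. Rates are used directly as acceptance probabilities, so they must lie in [0, 1].

```python
import random


def simulate_tasep(gammas, alpha, beta, ell=1, sweeps=2000, warmup=500, seed=0):
    """Monte Carlo estimate of the steady-state current of an open TASEP
    with particles covering `ell` sites and site-dependent hopping rates.

    Conventions (assumptions of this sketch): a particle is labelled by its
    leftmost ("reader") site p; it enters at p = 0 with rate alpha when sites
    0..ell-1 are empty, hops p -> p+1 with rate gammas[p] when site p+ell is
    empty, and leaves as a whole with rate beta once p == L - ell.
    """
    rng = random.Random(seed)
    L = len(gammas)
    last = L - ell                 # reader position from which particles exit
    occ = [False] * L              # is the site covered by some particle?
    head = [False] * L             # does a particle's reader sit on this site?
    exits = 0
    for sweep in range(warmup + sweeps):
        # One sweep = one unit of time: as many attempts as possible moves.
        for _ in range(last + 2):
            k = rng.randrange(-1, last + 1)        # k == -1 encodes entry
            if k == -1:
                if not any(occ[:ell]) and rng.random() < alpha:
                    occ[:ell] = [True] * ell
                    head[0] = True
            elif head[k]:
                if k == last:
                    if rng.random() < beta:        # exit of the whole particle
                        occ[k:] = [False] * ell
                        head[k] = False
                        if sweep >= warmup:
                            exits += 1
                elif not occ[k + ell] and rng.random() < gammas[k]:
                    occ[k], occ[k + ell] = False, True
                    head[k], head[k + 1] = False, True
    return exits / sweeps          # particles leaving per unit time
```

With all rates equal to 1, the estimate should come out near 1/4 for ℓ = 1 and near (1 + √12)^{-2} ≈ 0.05 for ℓ = 12, up to finite-size corrections and statistical noise; a single slow site visibly lowers it.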

4 Some recent developments 

In this section, we present two topics where some recent progress has been made. We begin (Section 4.1) with an analysis of a particular instance of the cell having limited resources available. In TASEP language, we are exploring how a TASEP is affected by having a finite reservoir of particles. The effects of several TASEPs competing for the same pool of particles [1, 19, 21] will also be presented. The rationale behind such pursuits is that a cell has thousands of copies of thousands of different types of mRNAs, competing for the same pool of ribosomes. Do some "win" while others "lose"? In the language of TASEPs, since we model an mRNA by a sequence {γ_i}, we will be interested in J({γ_i}; N_tot), namely, how the current associated with this sequence depends on N_tot, the total number of particles available (on the lattices and in the pool). Our analysis here will be restricted to homogeneous TASEPs (γ_i = 1) of differing lengths.

Section 4.2 will be devoted to our search for a simple ("quick and dirty") way to estimate J(α, β, {γ_i}, ℓ) for a single TASEP but with a fully inhomogeneous sequence of hopping rates, {γ_i}. This search leads us to a novel form of quenched randomness, which we named⁶ "distribution of distributions." Recall that a protein is a fixed sequence of L amino acids and can be coded by O(e^L) different mRNAs. Suppose we wish to synthesize an artificial protein consisting only of R's. We can use any one of the 6^L possible mRNAs, each of which corresponds to a realization of a quenched random sequence of codons, chosen from a single distribution of 6 values (corresponding to CGU, CGC, CGA, CGG, AGA, or AGG in this example). This procedure is standard for problems involving quenched disorder.⁷ However, in a naturally occurring protein (the "wild type"), the L amino acids will be different, so that the sequence of degeneracies will be non-trivial and



6 This notion was presented in [98].

7 See, e.g., [39, 97, 37].
fixed (e.g., 4266224 for the amino acid string PQLRFEV). Thus, instead of choosing codons from a single distribution to construct all possible mRNAs (as in the artificial RRRRR case or in all previous studies of quenched disorder), they must be chosen from a fixed sequence of different distributions. Pursuing these ideas further, we discovered a remarkable fact about E. coli. Simulating 5000 randomly chosen sequences for each of 10 specific genes, we find that the average currents lie in a narrow range (within 25% of each other). However, the currents associated with the wild types typically lie far above the average. These are intriguing findings, from the perspectives of both the statistical physics of quenched disorder and the specific realization "chosen" by the living organism.

4.1 Competition for ribosomes: TASEP with finite particle reservoirs 

In a living cell, ribosomes are constantly synthesized and degraded. On the other hand, it is believed that some are also "recycled," i.e., after termination in translating one gene, the subunits reassemble to translate another gene. Of course, there are multiple facets to "ribosome recycling." Chou considered the enhancement of initiation rates on a gene due to the proximity of a ribosome which unbinds from the same mRNA [13]. We consider a different aspect. Ignoring synthesis/degradation, let us model the number of ribosomes in a cell by a constant, N_tot, to be shared by all the genes. Then we ask: What is the effect of multiple TASEPs competing for a single pool with a finite number of particles? As a baseline study, we first focus on the effects of finite N_tot on just one homogeneous TASEP [1, 19, 20]. For example, we seek ρ(N_tot), the dependence of the overall density on the total particle number. This study is then extended to include multiple (homogeneous) TASEPs [21] with possibly different L's. Will the overall densities and currents be the same or different? If the latter, how are they controlled by N_tot? So far, all studies are based on point particles and uniform entry/exit rates.

For these investigations, it is clear that the alternative representation of an open TASEP in Section 3.2 is most natural, with n_0 being the number of particles in the pool. In the single TASEP case, novel behavior already arises when we introduce only one modification: allowing N_tot to be lowered below L + 1. Ha and den Nijs coined this the "parking garage problem" and provided many interesting results [50]. To model how translation might be affected by the scarcity of ribosomes, we let the binding rate of a ribosome to the mRNA, γ_0, depend on the ribosome concentration. In particular, when the ribosome concentration is very low, we let γ_0 be proportional to it. At the opposite extreme, when ribosomes are abundant, their concentration should have no effect on γ_0, which should take on some intrinsic value, denoted by α, associated purely with the binding kinetics. In our model, n_0 is proportional to this concentration, so we simply choose a convenient γ_0(n_0) which interpolates between these two limits. In all the simulation studies cited above, we have

$$\gamma_0(n_0) = \alpha \tanh(n_0/N^*), \tag{7}$$

where N* is a crossover parameter (chosen to be O(L) for convenience).
By contrast, the exit rate should not be affected by n_0, so we simply have γ_L = β. Since N_tot = n_0 + N, where N = ρL is the number of particles on the lattice, both n_0 and N will be small as N_tot is increased from 0, and the system first finds itself in the LD phase. At the other extreme (N_tot → ∞), n_0 is necessarily large as well (since N ≤ L), so that we arrive at an ordinary open TASEP associated with (α, β). A crossover occurs when N_tot reaches O(N*) = O(L). As may be expected, the LD-LD and LD-MC crossovers are uneventful, since no discontinuities are encountered. The response of the TASEP can be well approximated by a self-consistent equation for ρ(N_tot):

$$\rho = \gamma_0 = \alpha \tanh\!\left(\frac{N_{\rm tot} - \rho L}{N^*}\right). \tag{8}$$
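Because the left-hand side of eqn. (8) increases with ρ while the right-hand side decreases, the equation has a single root in [0, 1] (for α ≤ 1), which plain bisection finds. The sketch below is ours, not the authors' code; the function name is hypothetical, and it assumes point particles and applies only on the LD branch, i.e., for N_tot below the LD-HD threshold N_tot^− given in the next paragraph.

```python
import math


def ld_density(n_tot: float, L: int, n_star: float, alpha: float,
               tol: float = 1e-12) -> float:
    """Solve rho = alpha * tanh((n_tot - rho * L) / n_star), eqn. (8), for rho.

    g(rho) = rho - alpha * tanh(...) is strictly increasing in rho, with
    g(0) <= 0 for n_tot >= 0 and g(1) > 0 for alpha <= 1, so bisection on
    [0, 1] converges to the unique root.
    """
    def g(rho: float) -> float:
        return rho - alpha * math.tanh((n_tot - rho * L) / n_star)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, with α = 0.7 and β = 0.3 (the values used in Fig. 4) and the illustrative choice L = N* = 1000, the root grows from 0 at N_tot = 0 and reaches β = 0.3 exactly at N_tot = βL + N* tanh^{-1}(β/α) ≈ 758.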

More interesting is the LD-HD crossover, since it spans a discontinuous boundary. The response is well described by the following. Raising N_tot from 0, the average density is given by eqn. (8) until a critical value, N_tot^− = βL + N* tanh^{-1}(β/α), is reached. Lowering N_tot from ∞, ρ remains at the HD value ρ_HD = 1 − β until N_tot reaches another critical value, N_tot^+ = (1 − β)L + N* tanh^{-1}(β/α). Between N_tot^− and N_tot^+, all increases in N_tot are absorbed by the TASEP (while n_0 and γ_0 stay constant). Thus, ρ rises linearly: ρ(N_tot) = β + (N_tot − N_tot^−)/L. Such a response has an analog in equilibrium first-order transitions, corresponding to, e.g., the linear section in an isotherm in the P-V diagram of a binary mixture. Furthermore, the average profile ⟨ρ_i⟩ in this regime is also noteworthy. Instead of being linear in i (as in an unconstrained TASEP), it resembles a stationary shock. The underlying physics is understandable: the feedback from the pool prevents the shock from wandering throughout the lattice. Instead, the shock is localized at a position controlled by N_tot, while its fluctuations are controlled by another detail of the feedback, dγ_0/dn_0. Domain wall theory, so successful in providing good approximations for an ordinary TASEP, can be generalized to account for the feedback and gives excellent "zero-parameter fits" to simulation data [19]. In passing, let us mention that even more remarkable structures appear in the special case of LD-SP (i.e., setting α = β and varying N_tot). In all cases, the current displays no major surprises, mainly following the J(ρ)
curve of an unconstrained TASEP. The interested reader is referred to [TO] 
for details. 
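The piecewise response described above can be written down directly. The following sketch (hypothetical helper names, not the authors' code) evaluates the two critical values N_tot^∓ and the linear branch between them; it assumes point particles, β < α, β < 1/2, and the tanh feedback of eqn. (7), and it captures only the bookkeeping stated in the text, not the domain wall theory of [19]. Between the two thresholds, n_0 = N_tot − ρL stays pinned at N* tanh^{-1}(β/α), so that γ_0 = β.

```python
import math


def crossover_points(L: int, n_star: float, alpha: float, beta: float):
    """Critical values N_tot^- and N_tot^+ of the LD-HD crossover (beta < alpha)."""
    n0_pinned = n_star * math.atanh(beta / alpha)   # pool size at which gamma_0 = beta
    return beta * L + n0_pinned, (1.0 - beta) * L + n0_pinned


def shock_regime_density(n_tot: float, L: int, n_star: float,
                         alpha: float, beta: float) -> float:
    """Linear branch rho = beta + (N_tot - N_tot^-)/L between N_tot^- and N_tot^+."""
    n_minus, n_plus = crossover_points(L, n_star, alpha, beta)
    if not n_minus <= n_tot <= n_plus:
        raise ValueError("N_tot lies outside the linear (localized-shock) regime")
    return beta + (n_tot - n_minus) / L
```

With α = 0.7, β = 0.3 and the illustrative choice L = N* = 1000, the linear regime spans N_tot^+ − N_tot^− = (1 − 2β)L = 400 particles, over which ρ climbs from β = 0.3 to 1 − β = 0.7.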

Next, we turn to multiple TASEPs and their competition for a finite pool of particles [21]. To model different genes and the many possibilities of regulation, we need (at the least) three parameters for each type (μ) of TASEP: L_μ, α_μ, β_μ. A systematic study in the full parameter space of M TASEPs quickly becomes unmanageable, so ours is restricted to M = 2, 3. On the other hand, as an attempt at more realistic models, one unpublished study [18] simulated the competition of 10 genes from E. coli (with L's ranging from 109 to 558; details in the next subsection). With ℓ = 12 and the appropriate sets of γ_i's, the only unrealistic part of this study is setting = 1 for all genes. Not surprisingly, for N_tot ~ O(1), the currents are all the same, being controlled by the same small entry rate, γ_0. For N_tot ≳ 300, each TASEP is saturated in its MC-like phase, so that the currents differ by a factor of ~2. The approximate value of 300 can be expected from 10 (genes) × 400 (typical length) / 12 (ℓ). Meanwhile, crossovers occur at N_tot's in the range of 100-250. The conclusion of this limited study is that, while a first attempt has been made at the question of mRNA competition for ribosomes in a real cell, much more remains to be explored before meaningful insights can be developed.

Focusing on a more systematic (though less "realistic") study of competition, we consider two TASEPs. The model here consists of two lattices with L_{1,2} sites, joined at one site (site 0, the pool), so that it has the topology of two rings joined at one point. When site 0 is chosen, a particle attempts, with equal probability, to move onto one of the two lattices. Once a lattice is chosen, the particle enters with rate α_{1,2}. As usual, there is no exclusion at site 0, so that particles simply hop from sites L_{1,2} into the pool with rates β_{1,2}. For simplicity, we let α_{1,2} = α and β_{1,2} = β in this initial study [21]. Perhaps as expected, when the two TASEPs are identical (i.e., L_1 = L_2), the symmetry is not spontaneously broken: the two response curves are the same, within statistical fluctuations. However, when the lengths are very different, a new pattern emerges. In particular, for L_1 = 1000 and L_2 = 100, we find roughly five regimes in the LD-HD crossover. While the longer TASEP displays essentially the same behavior as in the single TASEP case (three regimes), the shorter one experiences more variety (Fig. 4).

It is remarkable that, in the central section, ρ(N_tot) is linear for both, with that of the shorter one being constant! It turns out that the shock in this TASEP is delocalized, but acts as a control for the shock in the longer one. The motion of the two shocks is completely anti-correlated, so that n_0, the pool particle number, remains essentially fixed. As a result, the average density profiles are quite different, being strictly linear for the short one. For the longer TASEP, the profile can be readily described by the profile of a single constrained TASEP, but "smeared out" over a distance of L_2 (the length of the shorter lattice). As can be seen from Fig. 4a, the generalized domain wall theory is quite successful at capturing all this novel behavior. Finally, these insights can be exploited to understand the behavior of three TASEPs in competition. Though not dramatically different, new features do appear, especially in cases where the lengths are widely separated. For example, Fig. 4b is an illustration of ρ(N_tot) for L_1 = 10, L_2 = 100, and L_3 = 1000. Many other results, such as remarkable properties of the stationary P*({k_μ}), the probability to find the domain walls located at sites k_μ in lattices μ, are available. Beyond the scope of this article, these may be found in [21]. Of course, we have taken only a minuscule step towards modeling competition in real cells. In addition to containing thousands of different genes (e.g., 5416 in one strain of E. coli), a cell can hold thousands of copies of each type, not to mention that we should include the three essential ingredients pointed out at the end of Section 3. Finally, looking far ahead, we can consider the genes competing for finite pools of the 46 varieties of aa-tRNA, a problem involving feedback from the details of the average occupation at each site. Clearly, this is a gargantuan task, and much remains to be investigated before we can claim to understand competition for finite resources in a real cell.

4.2 A simple estimate for currents in the inhomogeneous TASEP 

In this subsection, we return to a single open TASEP with extended particles (ℓ > 1) hopping with a fully inhomogeneous set of rates {γ_i}. We will
Fig. 4 Average overall densities and currents as a function of N_tot when two/three TASEPs are competing for a single pool of particles. In both cases, α = 0.7 and β = 0.3. The currents in all cases follow approximately the same curve as in the single TASEP case. Denoted by essentially indistinguishable solid and dashed lines (color online), they are marked by the call-out "Js". Simulations (solid symbols) and predictions from generalized domain wall theory (open symbols) of the densities are as follows: (a) L_1 = 1000 (circles, blue online) and L_2 = 100 (squares, red online); (b) L_1 = 1000, L_2 = 100, and L_3 = 10 (triangles, green online).



focus only on the average current J(α, β, {γ_i}; ℓ). The task of predicting this J is clearly beyond our present analytic abilities. Faced with this impasse, one reasonable question is: Is there a simple way to estimate it? Of course, in the limit α ≪ γ_i, β, the particle density will be exceedingly low, so that the particles are non-interacting to a good approximation. Then we simply have J ≈ α, since the total time it takes a particle to traverse the lattice (mRNA) becomes irrelevant.⁸ Note that exclusion plays no role, so that ℓ is also irrelevant. This consideration, based on the idea of the "worst bottleneck," can be used to provide the most naive estimate of J:

$$J_{\rm worst\;b'neck} = \min\{\alpha, \beta, \gamma_{\min}\}, \tag{9}$$

8 Actually, many exact results exist for the full stochastic problem of just a single 
particle hopping on such a ring. But we will not pursue this line further, since our 
main interest will be the many-body problem on open lattices. 
where γ_min is the minimum in the set {γ_i}. Another possible estimate is to use the average and variance of the entire set {γ_i^{-1}}, which we denote by r̄ and σ_r², respectively (following the notation in [57]). These quantities proved quite successful in the analysis of quenched random averages of currents [94, 55, 57]. However, both estimates have limitations when addressing the issue of interest here, namely, finding a reasonably good estimate of the current for a specific sequence {γ_i}. As with the estimate J ≈ α, (9) is useful only when the bottleneck is very severe. But realistic rates are typically not so extreme that γ_min is drastically smaller than the rest of the γ's. More crucially, if we have more than one site with γ_min, then J will be affected by their locations. For example, studies of just two slow sites [14, 34, 41] showed that J can be lower by as much as a factor of 2 when they are neighbors rather than far apart. Indeed, if the sequence contains a consecutive string of k such sites with k ≫ ℓ, then we can regard this stretch as an open, homogeneous TASEP in its MC phase. The considerations around eqn. (5) then provide us with J ≈ γ_min(1 + √ℓ)^{-2}. On the other hand, if these k slow sites are very far apart, then the estimate for J due to a single slow site [14, 87, 35, 34, 60] (which reduces to J_worst b'neck, to lowest order in γ_min) should suffice. Thus, the clustering of many slow sites indeed suppresses J, by as much as a factor of 20 for ℓ = 12. Similar limitations exist for the other estimate. Given a particular set {γ_i} (e.g., a real gene found in nature), we may compute r̄ and σ_r² by assuming that this set is a good representative of the underlying distribution of γ's. Yet, neither of these quantities contains any information on the location of slow sites. Thus, we face quite a range of uncertainties when attempting to provide a good estimate. In the remainder of this article, we propose a rough and simple, yet tolerably reliable, estimate for J(α, β, {γ_i}).

Since clustering of slow sites appears to play an important role, we attempt to consider a "coarse-grained" set of rates. In particular, we follow the notion introduced in [57] and define

$$(K_s)_i = \left[\frac{1}{s}\sum_{k=i-s+1}^{i} \gamma_k^{-1}\right]^{-1}. \tag{10}$$

The sum in this expression is recognizable as the typical time for a (free) particle to traverse a stretch of s sites up to site i. Thus, (K_s)_i can be regarded as a "coarse-grained" rate associated with hopping from site i. Obviously, by setting s = i = L we recover a quantity that resembles r̄, but our interest is more mesoscopic, e.g., s ≈ ℓ, since that would account for some of the effects of clustering of slow sites. Combining this notion with the idea of the bottleneck being the limiting factor, we propose that K_{ℓ,min}, the smallest rate in {(K_ℓ)_i}, can be exploited to give a good estimate for J. Note that we are not proposing J = K_{ℓ,min}, since the (maximal) current for a homogeneous TASEP would be K_{ℓ,min}(1 + √ℓ)^{-2}, a value 20 times lower than K_{ℓ,min} in the case of ℓ = 12! Instead, our hope is that a linear relationship J ∝ K_{ℓ,min} would be adequate. To be specific, let us focus only on ℓ = 12 TASEPs with
Fig. 5 A 5-amino-acid "designer gene" IWAMS with its associated degeneracies (m) in the second row. The third row shows explicitly the m synonymous codons and aa-tRNA cellular concentrations from [33]. The last row shows the corresponding hopping rates used in our simulations, defined in eqn. (12).



large entry/exit rates, i.e., α = β = γ_max, and use Monte Carlo techniques to find J(γ_max, γ_max, {γ_i}; 12). To simplify notation, this average current will be denoted simply by J({γ_i}). Then, allowing a phenomenological slope, A, we will test how well

$$J(\{\gamma_i\}) \simeq A\,K_{12,\min} \tag{11}$$

is obeyed.
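The estimates discussed above are easy to compute for any given sequence. In the hedged sketch below (helper names are ours), eqn. (9) is the worst-bottleneck estimate, the coarse-grained rates follow our reading of eqn. (10) as a harmonic mean over the window of s sites ending at site i, and the default slope A = 0.0656 is the value reported later in this section for the ℓ = 12 E. coli ensemble; both the window convention and the use of that A for arbitrary sequences are assumptions.

```python
def worst_bottleneck(alpha: float, beta: float, gammas: list) -> float:
    """Naive estimate (9): the slowest single step limits the current."""
    return min(alpha, beta, min(gammas))


def coarse_grained_rates(gammas: list, s: int) -> list:
    """(K_s)_i for i = s-1, ..., L-1 (0-based): harmonic mean of the s rates
    in the window ending at site i, i.e. s / (sum of 1/gamma over the window)."""
    inv = [1.0 / g for g in gammas]
    return [s / sum(inv[i - s + 1:i + 1]) for i in range(s - 1, len(inv))]


def estimate_current(gammas: list, ell: int = 12, slope: float = 0.0656) -> float:
    """Estimate (11): J ~ A * K_{ell,min}."""
    return slope * min(coarse_grained_rates(gammas, ell))
```

For instance, twelve consecutive slow sites with γ = 0.5 embedded in a lattice of unit rates give K_{12,min} = 0.5, whereas the same twelve slow sites spaced twelve sites apart give K_{12,min} = 12/13 ≈ 0.92; the worst-bottleneck estimate returns 0.5 in both cases and cannot distinguish them.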

Before describing the results of such a test, let us provide some details on the ensemble of genes we will use, as well as the concept of a quenched "distribution of distributions." As pointed out above, if we wish to synthesize a particular protein (a specific sequence of L amino acids, {aa_i}), we can use any of ∏_i m(aa_i) different mRNAs, using the appropriate degeneracies given in Table 1. To help the reader, let us provide a simple example: a fictitious L = 5 "protein chain," IWAMS, shown in the first row of Fig. 5. From Table 1, we see that the sequence of m's is 31416, shown in the second row. So, there are 72 (= 3·1·4·1·6) possible "genes" which can code for this "protein." All the possible codons are shown in the third row, so all 72 can be read off, e.g., AUCUGGGCCAUGUCC.
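The counting in this example follows directly from the codon degeneracies of the standard genetic code. The Python sketch below (helper names are ours) reproduces the sequence 31416 and the 72 synonymous "genes" for IWAMS, as well as the degeneracy string 4266224 quoted earlier for PQLRFEV; explicit codon lists are included only for the five amino acids of IWAMS.

```python
from itertools import product
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

# Synonymous codons, listed only for the amino acids of the IWAMS example.
CODONS = {
    "I": ["AUU", "AUC", "AUA"],
    "W": ["UGG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "M": ["AUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
}


def degeneracy_string(protein: str) -> str:
    """Sequence of m's, e.g. '31416' for IWAMS."""
    return "".join(str(DEGENERACY[aa]) for aa in protein)


def count_mrnas(protein: str) -> int:
    """Number of distinct mRNAs coding for the protein: the product of the m's."""
    return prod(DEGENERACY[aa] for aa in protein)


def all_mrnas(protein: str) -> list:
    """Enumerate every synonymous mRNA (requires codon lists for each amino acid)."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in protein))]
```

Note that the 20 degeneracies sum to 61, the number of sense codons, and that count_mrnas("PQLRFEV") gives 4·2·6·6·2·2·4 = 4608.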

One natural question is: If these 72 possibilities are generated with equal probability, what is the distribution of the currents? To answer this, we must deal with another complication. Corresponding to each codon is an aa-tRNA, but their relative abundances are not all distinct; many are the same (in E. coli, for a certain growth condition [33]), as shown within [...] next to each codon in the third row. In our simulations, we normalized the hopping rate associated with the highest abundance, 29.35, to unity, and we list all the possible γ's in the fourth row. So, of the 72 possible "genes," there are only 10 (= 1·1·2·1·5) distinct sequences of {γ_i}. We should therefore alert the reader to another complication when considering our ensemble of 72 (equally probable) "genes": not all {γ_i}'s are equally probable, since there are only 10 possible distinct {γ_i}'s. As an illustration, the set {0.421, 0.171, 0.695, 0.477, 0.388} is three times more likely to occur than {0.421, 0.171, 0.122, 0.477, 0.388}, since 3 of the 4 codons coding for A have the same abundance, 20.97. Since average production rates (i.e., J's) depend only on the sequence {γ_i}, the probabilities of any J occurring in our ensemble will not be uniform.

With this illustration in mind, let us define our notation for an explicit formulation.

- Let ν = 1, 2, ... label the various proteins in a cell. Typically, there would be thousands. Below, we will study just 10 in E. coli: five highly expressed ones (dnaA, ompA, rspA, rplA, tufA) and five poorly expressed ones (araC, lamB, lacI, secD, trpR). Each is a specific sequence of L_ν amino acids, which we denote by {aa_i}_ν, i = 1, ..., L_ν. Of course, 1 ≤ aa ≤ 20, associated with the 20 letters in the first row of Table 1.

- To synthesize each ν, there are M_ν distinct mRNAs (sequences of codons), which we denote as {c_i}_ν^μ. Let us label these sequences by μ = 1, ..., M_ν. Obviously, all sequences {c_i}_ν^μ have the same length as {aa_i}_ν, namely L_ν. Here, the variable c lies between 1 and 61, but c_i (its value at site i) is linked to the value of aa_i (via the aa-codon mapping). Recall that this mapping is one-to-many (1-m_aa), as the first three rows in Table 1 illustrate. Thus, M_ν = ∏_{i=1}^{L_ν} m_{aa_i} is a large number, typically O(exp L_ν).

- Depending on the conditions in which a cell finds itself, different aa-tRNAs are found with varying concentrations (e.g., ref. [33] for E. coli). Following typical notation, we write [c] for the concentration of the aa-tRNA associated with codon c. As we see in the third row of Table 1, the correspondence between [c] and c is also often one-to-many: a single concentration value is shared by several codons. For simplicity, we assume the ribosome's hopping rate, from site i to i + 1, to be proportional to [c_i]. Normalizing these rates so that unity is associated with the largest concentration, [max], we use

$$\gamma_i = \frac{[c_i]}{[\max]} \tag{12}$$

for our simulations.

With this framework in place, let us discuss our findings from the following simulations. For each of the 10 proteins listed above, we generated 5000 sequences {c_i}_ν with no bias, and compiled the associated {γ_i}_ν accordingly. For each member of this ensemble, we computed K_{12,min} and simulated the associated TASEP (with particles of size ℓ = 12) to obtain its current J({γ_i}_ν). These pairs of values are plotted in the J-K plane. They generally form an elliptical cluster (5K indiscernible points, red online), as shown in Fig. 6 for each of the 10 proteins.

Two other features appear in these plots: a dashed line and three points (stars), all blue online. The lowest point corresponds to an "abysmal" sequence, formed by having the lowest allowed γ at each site. Thus, it produces the lowest possible (J, K). Similarly, the highest point is associated with an "optimal" sequence, with the highest possible (J, K) for this protein. The point in the middle is derived from the wild type (naturally occurring) sequence. Finally, the dashed line is the best linear fit through the three points, constrained to pass through the origin (J = K = 0). Remarkably, the 5K points lie reasonably close to the dashed line, giving us hope that expression (11) might be quite good. Before detailing quantitative aspects of the analysis, let us comment on a remarkable aspect of these data. From the
Fig. 6 Relation between J and K_{12,min} for synthesizing 10 proteins in E. coli. Stars (*, blue online) are from the "abysmal", wild type, and "optimal" sequences. The dashed line (blue online) is the best linear fit through them and the origin. The elliptical cluster (red online) is from 5000 randomly generated sequences which code for the 10 indicated proteins.



5K simulated J's, we compiled histograms to form a current distribution for each ν.

As shown in Fig. 7, no major surprises are apparent: all distributions seem normal, with means in the approximate range of 1.00-1.25 (for 100 × J) and standard deviations of 0.05-0.10. Their skewness and kurtosis (both unitless measures of deviations from pure Gaussians) fall in the ranges [−0.3, 0.3] and [−0.1, 0.4], respectively. However, even at a casual glance at the panels in Fig. 6, the reader may notice that all but two of the wild types lie well above the cluster of 5K points. Indeed, five of them are more than 6.5 standard deviations above the mean. We can only speculate that natural evolution has optimized the production rates! Work is in progress to study the rest of the 5416 proteins in E. coli and, if this systematic deviation persists, to consider possible deeper underpinnings of this phenomenon.
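The summary statistics quoted above can be obtained from each set of 5000 simulated currents along the lines of the sketch below. The text does not state which estimators were used; we assume plain population moments and report kurtosis as excess kurtosis (zero for a Gaussian), which seems consistent with the quoted ranges. The helper names are hypothetical. The same moments give the number of standard deviations by which a wild-type current lies above its ensemble mean.

```python
def moments_summary(values):
    """Mean, standard deviation, skewness and excess kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return mean, m2 ** 0.5, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def z_score(value, values):
    """Distance of one current (e.g. the wild type's) from the ensemble mean,
    in units of the ensemble standard deviation."""
    mean, std, _, _ = moments_summary(values)
    return (value - mean) / std
```

A wild type "more than 6.5 standard deviations above the mean" corresponds to z_score(J_wild_type, currents) > 6.5, where J_wild_type and currents stand for the simulated values.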

Fig. 7 Current distributions for 5000 TASEP sequences (modeling the silent mutations) which code for the 10 indicated proteins.

Returning to the more practical issue at hand, we seek a quantitative description in an attempt to test expression (11). For a particular protein ν, we consider an ensemble in which all M_ν sequences {c_i}_ν are equally probable. However, as illustrated by the last two rows of Table 1, each distinct {γ_i}_ν sequence can result from several {c_i}_ν sequences (depending on conditions in the cell, and other complications which we ignore here). Due to this complicated c_i-γ_i connection, there are far fewer {γ_i}_ν sequences, so that the distribution of γ's, which we denote by P_ν[{γ_i}], will not be trivial. Since J depends only on {γ_i} and not {c_i}, this P_ν will control the average J over our ensemble. Of course, P_ν[{γ_i}] is still a product distribution, ∏_i p_i(γ_i), since no correlations between sites are assumed. But, unlike previous studies, the γ's here must be chosen from site-dependent distributions, hence the subscript i on p_i. To clarify, let us return to our illustration, in which P_IWAMS[{γ_i}] = ∏_{i=1}^{5} p_i(γ_i) with, explicitly,

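$$\begin{aligned}
p_1(\gamma) &= \delta(\gamma - 0.421)\\
p_2(\gamma) &= \delta(\gamma - 0.171)\\
p_3(\gamma) &= [\delta(\gamma - 0.122) + 3\,\delta(\gamma - 0.695)]/4\\
p_4(\gamma) &= \delta(\gamma - 0.477)\\
p_5(\gamma) &= [\delta(\gamma - 0.137) + 2\,\delta(\gamma - 0.193) + \delta(\gamma - 0.251) + \delta(\gamma - 0.300) + \delta(\gamma - 0.388)]/6
\end{aligned}$$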



Since the sequence of p_i's is fixed by the amino acid sequence {aa_i}, we are constrained by a quenched distribution of different p's. Thus, we arrive at the notion of a quenched "distribution of distributions."
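For the IWAMS illustration, the quenched "distribution of distributions" is small enough to enumerate exactly. In the sketch below (our own helper names), each p_i is stored as (γ, number of codons) pairs read off from the expressions above; the code builds P_IWAMS[{γ_i}] and evaluates an ensemble average of the kind defined in the next paragraph. Since obtaining J itself requires a simulation, the smallest rate in each sequence is used here purely as a stand-in observable.

```python
from itertools import product
from math import prod

# p_i for IWAMS as (gamma, number of codons giving that gamma), read off from
# the explicit expressions above; the codon counts at each site sum to m_i.
IWAMS_SITES = [
    [(0.421, 3)],                                                    # I
    [(0.171, 1)],                                                    # W
    [(0.122, 1), (0.695, 3)],                                        # A
    [(0.477, 1)],                                                    # M
    [(0.137, 1), (0.193, 2), (0.251, 1), (0.300, 1), (0.388, 1)],    # S
]


def gamma_sequence_distribution(sites):
    """P[{gamma_i}] = prod_i p_i(gamma_i) over all distinct rate sequences."""
    total = prod(sum(n for _, n in site) for site in sites)   # M, e.g. 72
    dist = {}
    for combo in product(*sites):
        seq = tuple(g for g, _ in combo)
        dist[seq] = dist.get(seq, 0.0) + prod(n for _, n in combo) / total
    return dist


def ensemble_average(dist, f):
    """<f> = sum over distinct sequences of f({gamma_i}) * P[{gamma_i}]."""
    return sum(p * f(seq) for seq, p in dist.items())
```

It recovers the ten distinct sequences, the factor of 3 between the two example sequences quoted earlier, and, for the stand-in observable, ensemble_average(dist, min) = 0.1545 for this toy ensemble.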

With this framework in mind, we can define another average of J, associated with all possible ways of producing protein ν in our ensemble of M_ν (equally probable) silent mutations:

$$\langle J \rangle_\nu = \int \mathcal{D}\gamma\; J(\{\gamma_i\})\, P_\nu[\{\gamma_i\}],$$



where Dγ denotes ∏_{i=1}^{L_ν} dγ_i. In a similar vein, K_{12,min} also depends only on {γ_i} (and not {c_i}), so that

$$\langle K_{12,\min} \rangle_\nu = \int \mathcal{D}\gamma\; K_{12,\min}(\{\gamma_i\})\, P_\nu[\{\gamma_i\}].$$

The simulations for each ν in Fig. 6 are a 5K-point sampling of this P_ν. Thus, the coordinates of the "center of mass" of the (roughly elliptical) cluster are estimates of ⟨J⟩_ν and ⟨K_{12,min}⟩_ν. Of course, we can consider other quantities of interest, such as ⟨δ(J − J({γ_i}))⟩_ν, corresponding to the histograms in Fig. 7. Other obvious possibilities are the second moments, which will provide us with the two axes and the orientation of each cluster, as well as a measure of the J-K correlation. Here, we are content to focus only on the averages and their ratios, ⟨J⟩_ν/⟨K_{12,min}⟩_ν, for these 10 proteins. Remarkably, though both ⟨J⟩_ν and ⟨K_{12,min}⟩_ν vary by 25% (over the 10 ν's), this ratio is essentially constant! This observation motivates us to define A in (11) by a further average:



$$A = \frac{1}{10}\sum_{\nu} \frac{\langle J \rangle_\nu}{\langle K_{12,\min} \rangle_\nu}.$$

From our data, we find

$$A = 0.0656,$$

which is, interestingly, comparable to (1 + √12)^{-2} = 0.0502. As a test of its "predictive power," we computed A K_{12,min} for all 5000 × 10 {γ_i}'s and compared them to the values of the currents obtained from simulations. Specifically, the average of (these 50K values of) A K_{12,min}/J is within 0.4% of unity, while the standard deviation is about 5%. Rarely does this ratio deviate by more than 15% from unity. In this sense, we are hopeful that, when we extend this study to the other 5406 genes in E. coli, we will confirm A K_{12,min}({γ_i}) as a simple and reliable estimate for J({γ_i}).

Despite being quite involved and extensive, this study has answered few questions in biology. Though it remains far from the goal of understanding protein synthesis in real cells, it does open rich new ground for exploring nonequilibrium statistical mechanics. The main progress here is that, should we wish to design silent mutations of genes that could outperform the wild type, either by enhancing or by suppressing the production rate of this protein, a reliable and simple method is now available to facilitate the search.
5 Conclusions and outlook

In this article, we have touched upon two fundamental issues in two very 
different fields: understanding nonequilibrium steady states and developing 
quantitative models for protein production. These two seemingly disparate 
problems converge in a simple one-dimensional transport model, the totally 
asymmetric simple exclusion process and its modifications. The TASEP is a 
paradigmatic far-from-equilibrium model, characterized by open boundaries 
and a systematic particle current through the system. Because of the exclusion
constraint, the particles interact, so finding steady-state and dynamic
properties is highly non-trivial. Still, a considerable body of exact results is
available for the standard TASEP. In particular, despite its one-dimensional
nature, the model displays three distinct phases, separated by first-order
and continuous transitions. If the model is modified to include extended
particles and inhomogeneous hopping rates, it is generally accessible only via 
simulations or approximate (mean-field) methods. 
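For the standard TASEP (single-site particles, unit bulk rate, entry rate α, exit rate β), the exact steady-state current in the large-system limit takes one of three forms, one per phase: low density (LD), high density (HD), and maximal current (MC). The sketch below encodes this. As an assumption beyond the text, it also accepts an extended-particle size `ell` and then uses the mean-field expressions commonly quoted for particles covering `ell` sites, whose maximal current $(1+\sqrt{\ell})^{-2}$ reduces to 1/4 for `ell = 1`. The function name is hypothetical.

```python
import math


def tasep_current(alpha: float, beta: float, ell: int = 1) -> tuple[str, float]:
    """Phase and steady-state current (in units of the bulk hopping rate).

    ell == 1: exact large-L results for the open TASEP.
    ell > 1 : mean-field expressions for particles covering `ell` sites
              (an assumption of this sketch, not a result from the text).
    """
    if alpha <= 0 or beta <= 0 or ell < 1:
        raise ValueError("need alpha, beta > 0 and ell >= 1")
    a_c = 1.0 / (1.0 + math.sqrt(ell))  # onset of the maximal-current phase

    def boundary_limited(x: float) -> float:
        # current when entry (x = alpha) or exit (x = beta) is the bottleneck
        return x * (1.0 - x) / (1.0 + (ell - 1) * x)

    if alpha >= a_c and beta >= a_c:
        return "MC", a_c ** 2  # maximal current (1 + sqrt(ell))^-2
    if alpha < beta:
        return "LD", boundary_limited(alpha)
    if beta < alpha:
        return "HD", boundary_limited(beta)
    # alpha == beta < a_c: the first-order LD/HD line
    return "LD/HD coexistence", boundary_limited(alpha)
```

For `ell = 1`, crossing the line α = β < 1/2 switches the bulk density discontinuously (first-order transition), while entering the MC phase at α = 1/2 or β = 1/2 is continuous.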

With these modifications, the model becomes a more realistic, but still
highly simplified, description of protein synthesis. The one-dimensional lattice
models the mRNA template, with sites and extended particles representing
codons and ribosomes, respectively. Further, we allow non-uniform hopping
rates to reflect the variability of the aa-tRNA concentrations associated
with different codons. The particle current through the TASEP is simply the
protein production rate. An interesting feature of translation is the
intricate degeneracy of the genetic code: 61 codons code for 20 amino acids
(mediated by 46 tRNAs in E. coli). In other words, there are many distinct
sequences ("silent mutations") that code for the same protein but are
characterized by different production rates.
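The degeneracy can be made concrete by counting the synonymous coding sequences of a short peptide: the number of silent variants is the product, over positions, of the number of codons for each amino acid. For brevity, the sketch below includes only six amino acids of the standard genetic code (mRNA alphabet), and the function names are hypothetical. Drawing each codon independently and uniformly yields a uniformly random silent variant; whether the sequences in the study were sampled this way is an assumption.

```python
import math
import random

# Synonymous codons (standard genetic code, mRNA alphabet) for a few amino
# acids only; a complete table would list all 61 sense codons.
SYNONYMOUS_CODONS: dict[str, list[str]] = {
    "M": ["AUG"],
    "W": ["UGG"],
    "K": ["AAA", "AAG"],
    "F": ["UUU", "UUC"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
}


def count_silent_variants(protein: str) -> int:
    """Number of coding sequences (wild type included) that yield `protein`."""
    return math.prod(len(SYNONYMOUS_CODONS[aa]) for aa in protein)


def random_silent_variant(protein: str, rng: random.Random) -> list[str]:
    """Pick each codon independently and uniformly among its synonyms.

    This samples uniformly from all count_silent_variants(protein) sequences.
    """
    return [rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein]


print(count_silent_variants("MKLG"))  # 1 * 2 * 6 * 4 = 48
```

For realistic protein lengths the count is far too large to enumerate, so random sampling of silent mutations is the practical route.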

In this article, we presented a brief introduction to the main findings for 
TASEP and the basics of protein synthesis, designed with non-experts in 
mind. We also described the modeling of translation in terms of a gener- 
alized TASEP, summarizing both well-established and more recent results. 
Amongst the latter we discussed two specific topics: first, the effects of limited 
availability of particles, and second, simple but remarkably good estimates 
for currents in the fully inhomogeneous case. The first project is motivated 
by the observation that ribosomes are large molecules so that their synthesis 
is costly for the cell. Hence, it is reasonable to expect them to be in limited 
supply. Considering only the simplest case (fully uniform rates and particles
covering only one site), we asked: How are currents and density profiles
affected if a single, or several, TASEPs compete for particles from a finite
reservoir? Remarkable results, such as multiple distinct regimes in density
profiles and shock localization, were discovered. The second discussion centered
on two questions: Is it possible to arrive at simple yet reliable estimates
for currents associated with fully inhomogeneous sequences? And how do the 
currents associated with the "ensemble" of silent mutations compare to that 
of the wild type? The answer to the first question relies on computing the 
typical time any particular codon is covered by a ribosome. In the language 
of TASEP, this is the time it takes a particle to traverse a stretch of 12 sites 
around a given site. This quantity can be determined from sequence information
with minimal effort (provided the aa-tRNA concentrations are known).
Its inverse is effectively a coarse-grained rate associated with hopping from
that site. It turns out that the lowest of these rates (in a given sequence),
denoted by $K_{12,\min}$, provides a good estimate for the average current.
Specifically, Monte Carlo simulations for 5000 randomly selected silent mutations
of 10 different proteins show a reliable linear relation between currents and
$K_{12,\min}$ values, with a proportionality constant that appears to be the same
for all the proteins studied. Moreover, the current and $K_{12,\min}$ of the wild
type also obey this linear relation, even though both fall well above the
typical values for randomly chosen sequences.
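The coarse-grained rates can be computed in a single sliding-window pass over the codon hopping rates $\{\gamma_i\}$. The sketch below reads the text literally: the time to traverse a window of 12 sites is approximated by the sum of the mean dwell times $1/\gamma_j$ (exclusion is ignored), and its inverse is the coarse-grained rate. Two choices are assumptions not fixed by the text: the window for site $i$ is taken to be sites $i, \dots, i+11$, and only windows lying fully inside the sequence are used. Normalizing instead by the window length (12 divided by the traversal time) would rescale every $K_{12}$ by 12 and $A$ by 1/12, leaving the estimate $A K_{12,\min}$ unchanged. The function names are hypothetical.

```python
from typing import Sequence


def coarse_grained_rates(rates: Sequence[float], window: int = 12) -> list[float]:
    """Inverse traversal time of every full window of `window` consecutive sites.

    Assumption: the window belonging to site i covers sites i .. i+window-1,
    and crossing it takes sum_j 1/gamma_j (exclusion between ribosomes ignored).
    """
    if window < 1 or len(rates) < window:
        raise ValueError("sequence must contain at least `window` codons")
    if any(g <= 0 for g in rates):
        raise ValueError("hopping rates must be positive")
    dwell = [1.0 / g for g in rates]  # mean dwell time on each codon
    t = sum(dwell[:window])           # traversal time of the first window
    result = [1.0 / t]
    for i in range(1, len(rates) - window + 1):
        # slide the window by one site: add site i+window-1, drop site i-1
        t += dwell[i + window - 1] - dwell[i - 1]
        result.append(1.0 / t)
    return result


def k_min(rates: Sequence[float], window: int = 12) -> float:
    """K_12,min: the slowest coarse-grained rate in the sequence."""
    return min(coarse_grained_rates(rates, window))
```

A slow codon lowers every window that contains it: in a uniform sequence of rate 1 with a single codon of rate 0.1, each such window takes 11 + 10 = 21 time units to cross, so `k_min` returns 1/21.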

Clearly, the explorations reported here leave many questions unanswered,
on both the statistical physics and the biology side. We cite just a few that
will hopefully spark future research. The central fundamental question concerns
the "stability" of steady-state properties with respect to model modifications.
Which changes in microscopic model details (e.g., hopping rates) will lead to
changes in microscopic or macroscopic behavior? While notions of universality
and independence from certain dynamic details are well understood for
equilibrium systems, we have taken only initial steps towards extending them
to nonequilibrium steady states [99, 100]. Further, little if anything is known
about how these general concepts apply to specific models. On the quantitative
biology side, even relatively simple questions remain open: Are aa-tRNA
concentrations really the limiting factor for protein production rates? Are
there other intrinsic rates, or is initiation the critical bottleneck? Secondary
structures are known to be important [64], but how exactly do they affect
production rates? Why are the currents of wild-type genes so optimized? Clearly,
fundamental insights and close collaborations between physicists and biologists
are needed before we can begin to understand biological processes, which are
generically far from equilibrium, at a quantitative level.